Text mining: intermediate forms on knowledge representation
Abstract
In this paper we review the main intermediate forms proposed in text mining and briefly study some of their fuzzy counterparts. The concept of an intermediate form applies to any knowledge representation used to represent the semantic content of a text corpus in a structured way. Intermediate forms play a central role in the text mining process, since plain text must be transformed into such a form before mining techniques can be applied. Because the semantics of text is usually imprecise, fuzzy intermediate forms seem to be a natural solution in many cases. We discuss fuzzy intermediate forms and the corresponding fuzzy text mining techniques that may be applicable to them.
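As a rough illustration of what a fuzzy intermediate form might look like, the sketch below turns plain text into a fuzzy bag of words, where each term carries a membership degree instead of a raw count, and compares two documents with a fuzzy Jaccard similarity. The abstract does not prescribe any particular form: the function names, the tokenizer, and the choice of membership function (term frequency divided by the highest frequency in the document) are all assumptions made for this example.

```python
from collections import Counter
import re


def fuzzy_bag_of_words(text: str) -> dict[str, float]:
    """Map each term to a membership degree in (0, 1].

    Assumption: membership is the term frequency divided by the highest
    term frequency in the document; the abstract does not specify this.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    if not counts:
        return {}
    peak = max(counts.values())
    return {term: n / peak for term, n in counts.items()}


def fuzzy_jaccard(a: dict[str, float], b: dict[str, float]) -> float:
    """Fuzzy Jaccard similarity: sum of minima divided by sum of maxima."""
    terms = set(a) | set(b)
    if not terms:
        return 0.0
    inter = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    union = sum(max(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    return inter / union


# Two toy documents (illustrative only).
doc_a = fuzzy_bag_of_words("fuzzy text mining mines text")
doc_b = fuzzy_bag_of_words("text mining of fuzzy data")
similarity = fuzzy_jaccard(doc_a, doc_b)
print(doc_a)
print(round(similarity, 3))
```

Once text is in this form, a mining technique such as clustering can operate on the membership degrees directly, using the similarity function in place of a crisp set overlap.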
Similar resources
Document Clustering Based on Ontology and a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering previously unknown knowledge from large amounts of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, one of the important techniques of text mining, is the unsupervised classification of similar documents into groups. The most important step...
Text Mining: Promises and Challenges
Text mining, also known as knowledge discovery from text or document information mining, refers to the process of extracting interesting patterns from very large text corpora for the purpose of discovering knowledge. Text mining is an interdisciplinary field involving information retrieval, text understanding, information extraction, clustering, categorization, visualization, database technol...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text motivated researchers to use...
From Lexical Semantics to Text Analysis
Motivation: One of the major challenges today is coping with an overabundance of potentially important information. With newspapers such as the Wall Street Journal available electronically as a large text database, the analysis of natural language texts for the purpose of information retrieval has found renewed interest. Knowledge extraction and knowledge detection in large text databases are...
Representing Documents via Latent Keyphrase Inference
Many text mining approaches adopt bag-of-words or n-grams models to represent documents. Looking beyond just the words, i.e., the explicit surface forms, in a document can improve a computer's understanding of text. Being aware of this, researchers have proposed concept-based models that rely on a human-curated knowledge base to incorporate other related concepts in the document representation....